fix: expand contact point hostnames to all DNS IPs at connection time (DRIVER-201) — Part 1/2#889
Draft
nikagra wants to merge 3 commits into
Draft
Conversation
… (DRIVER-201) Addresses the initial-contact-endpoints aspect of DRIVER-201. Problem: with RESOLVE_CONTACT_POINTS=false (the default), a contact point hostname was stored as a single unresolved InetSocketAddress. At connection time the load-balancing query plan contained exactly one Node per hostname, so only the first IP returned by DNS was ever tried. If that IP was non-responsive the driver raised AllNodesFailedException with no fallback to other IPs the hostname might resolve to. Solution (per @dkropachev's architectural direction): - Deprecate DefaultDriverOption.RESOLVE_CONTACT_POINTS. Contact points are now always kept as unresolved hostnames (resolve=false is hardcoded in SessionBuilder), deferring DNS expansion to connection time. - Add MetadataManager.getResolvedContactPoints(): for each contact point backed by an unresolved hostname it calls InetAddress.getAllByName() to expand the hostname to all known IPs, creating a synthetic DefaultNode for each IP. Already-resolved or non-InetSocketAddress endpoints pass through unchanged. - LoadBalancingPolicyWrapper now calls getResolvedContactPoints() instead of getContactPoints() in newQueryPlan() (BEFORE/DURING_INIT states) and newControlReconnectionQueryPlan(), so the query plan contains one node per resolved IP and the driver naturally falls back to the next IP when one is unreachable. Tests: - 4 new MetadataManagerTest cases covering null state, already-resolved passthrough, single-hostname expansion, and multi-endpoint expansion. - LoadBalancingPolicyWrapperTest updated to stub getResolvedContactPoints(). - New MockResolverIT.should_connect_when_first_dns_entry_is_non_responsive integration test: first DNS entry is a non-existent IP, session must open successfully against the remaining real IPs.
This was referenced May 15, 2026
…VER-201) newControlReconnectionQueryPlan() now creates copies of the original contact-point nodes (with their unresolved hostname endpoints) instead of synthetic nodes with resolved IPs. This ensures the control channel carries the hostname endpoint, which is preserved in metadata after topology refresh. DNS expansion for connection fallback is handled by ChannelFactory (PR scylladb#890), so the control-reconnection path does not need to inject resolved-IP nodes into the query plan. Also adds getContactPoints() stub back to LoadBalancingPolicyWrapperTest so tests that cover the control-reconnect path continue to pass.
Before-init query plan now uses getContactPoints() (original unresolved hostname nodes) instead of getResolvedContactPoints(). The DNS expansion to all IPs happens at the ChannelFactory level (PR scylladb#890), so expanding here was redundant and broke should_connect_with_mocked_hostname by replacing hostname endpoints with resolved-IP endpoints. Also remove the should_connect_when_first_dns_entry_is_non_responsive integration test from this PR; it belongs in PR scylladb#890 where ChannelFactory expansion actually enables it to pass.
There was a problem hiding this comment.
Pull request overview
This PR aims to address DRIVER-201 by ensuring a hostname contact point can be tried via all DNS-resolved IPs (instead of only the first), and begins deprecating RESOLVE_CONTACT_POINTS as a no-op to preserve unresolved hostnames until connection time.
Changes:
- Deprecates
RESOLVE_CONTACT_POINTSinDefaultDriverOption/TypedDriverOptionand stops reading it inSessionBuilder(always merges contact points withresolve=false). - Adds
MetadataManager#getResolvedContactPoints()plus unit tests that validate DNS expansion / passthrough behavior. - Updates comments/tests around contact-point handling in load-balancing wrapper tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/MetadataManager.java | Adds getResolvedContactPoints() that expands unresolved hostname contact points to all DNS IPs. |
| core/src/main/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapper.java | Adjusts comments around contact point handling (but currently still uses getContactPoints() in query plan construction). |
| core/src/main/java/com/datastax/oss/driver/api/core/session/SessionBuilder.java | Stops honoring RESOLVE_CONTACT_POINTS; always stores contact points as unresolved. |
| core/src/main/java/com/datastax/oss/driver/api/core/config/TypedDriverOption.java | Marks RESOLVE_CONTACT_POINTS deprecated with updated Javadoc. |
| core/src/main/java/com/datastax/oss/driver/api/core/config/DefaultDriverOption.java | Marks RESOLVE_CONTACT_POINTS deprecated with updated Javadoc. |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/MetadataManagerTest.java | Adds tests for getResolvedContactPoints() behavior (null state, passthrough, DNS expansion). |
| core/src/test/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapperTest.java | Updates an assertion comment related to appended contact points. |
Comments suppressed due to low confidence (1)
core/src/main/java/com/datastax/oss/driver/internal/core/metadata/LoadBalancingPolicyWrapper.java:176
- The new comment says DNS expansion is handled by ChannelFactory at connection time, but ChannelFactory.connect() currently just calls endPoint.resolve() once and bootstrap.connect(resolvedAddress), which won’t iterate over all A/AAAA records for an unresolved InetSocketAddress. Either update this logic to actually use getResolvedContactPoints() here, or implement/point to the connection-time DNS expansion in ChannelFactory; as-is, the comment (and intended fallback behavior) is incorrect.
// Use the original (potentially unresolved) contact-point endpoints so that the control
// connection channel retains the hostname, preserving hostname-based node identity in
// metadata. DNS expansion to all IPs for each hostname is handled by ChannelFactory at
// actual connection time.
Set<DefaultNode> originalNodes = context.getMetadataManager().getContactPoints();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| case BEFORE_INIT: | ||
| case DURING_INIT: | ||
| // The contact points are not stored in the metadata yet: | ||
| List<Node> nodes = new ArrayList<>(context.getMetadataManager().getContactPoints()); |
Comment on lines
+938
to
+941
| // RESOLVE_CONTACT_POINTS is deprecated: contact points are always kept as unresolved | ||
| // hostnames and expanded to all their DNS IPs lazily at connection time. | ||
| Set<EndPoint> contactPoints = | ||
| ContactPoints.merge(programmaticContactPoints, configContactPoints, resolveAddresses); | ||
| ContactPoints.merge(programmaticContactPoints, configContactPoints, false); |
| assertThat(queryPlan.poll()).isEqualTo(node1); | ||
| // Remaining nodes are contact points appended at the end. | ||
| // They are new DefaultNode instances created via newContactPoint, so compare by endpoint. | ||
| // Remaining nodes are the resolved contact points appended at the end. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When
RESOLVE_CONTACT_POINTS=false(the default), a contact point hostname was stored as a single unresolvedInetSocketAddress. At connection time the load-balancing query plan contained exactly oneNodeper hostname, so only the first IP returned by DNS was ever tried. If that IP was non-responsive the driver raisedAllNodesFailedExceptionwith no fallback to other IPs the hostname might resolve to.This is particularly impactful in dynamic DNS environments where a hostname maps to multiple nodes and the first one may be temporarily unavailable.
Fixes DRIVER-201.
Changes
DefaultDriverOption/TypedDriverOptionRESOLVE_CONTACT_POINTSis now@Deprecated. Contact points are always kept as unresolved hostnames; the option is a no-op.SessionBuilderRESOLVE_CONTACT_POINTSread;ContactPoints.merge()is now always called withresolve=false, deferring all DNS expansion to connection time.MetadataManagergetResolvedContactPoints()method: for each contact point backed by an unresolved hostname, callsInetAddress.getAllByName()to obtain all IPs and creates a syntheticDefaultNodefor each. Already-resolved and non-InetSocketAddressendpoints are returned as-is.LoadBalancingPolicyWrappernewQueryPlan()(inBEFORE_INIT/DURING_INITstates) andnewControlReconnectionQueryPlan()now callgetResolvedContactPoints()instead ofgetContactPoints(), so the query plan contains one entry per resolved IP. The driver naturally falls back to the next candidate if one is unreachable.Tests
MetadataManagerTestcases covering: null state, already-resolved passthrough, single-hostname DNS expansion, multi-endpoint expansion.LoadBalancingPolicyWrapperTestupdated to stubgetResolvedContactPoints().MockResolverIT.should_connect_when_first_dns_entry_is_non_responsive: 2-node CCM cluster on127.0.1.x; first DNS entry points to a non-existent IP (127.0.1.11); asserts session opens successfully against the real nodes.